---
title: JSON mode
description: Extract structured data from web pages using LLMs with JSON schema validation.
icon: Database
---

## Introduction

JSON mode enables you to extract structured data from any webpage using Large Language Models (LLMs).

AnyCrawl allows you to define a JSON schema and prompts to guide the extraction process, ensuring the output matches your desired format.

## Usage Examples

JSON mode supports three structured extraction modes:

1. **Prompt Only**: Use a natural language prompt to describe what you want to extract. The LLM determines the output structure.
2. **Schema Only**: Define a JSON schema to strictly constrain the output structure. The LLM fills in the content.
3. **Prompt + Schema**: Combine both schema and prompt for precise structure and content guidance.

---

### 1. Prompt Only (Flexible Structure)

**Steps:**

1. Set the target webpage URL
2. Write a prompt describing the information to extract
3. Send the request

```bash tab="cURL"
curl -X POST "https://api.anycrawl.dev/v1/scrape" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "json_options": {
      "user_prompt": "Extract the company mission, open source status, and employee count."
    }
  }'
```

```javascript tab="JavaScript"
const response = await fetch("https://api.anycrawl.dev/v1/scrape", {
    method: "POST",
    headers: {
        Authorization: "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    body: JSON.stringify({
        url: "https://example.com",
        json_options: {
            user_prompt: "Extract the company mission, open source status, and employee count.",
        },
    }),
});
const result = await response.json();
console.log(result.data.json);
```

```python tab="Python"
import requests

response = requests.post(
    'https://api.anycrawl.dev/v1/scrape',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
    },
    json={
        'url': 'https://example.com',
        'json_options': {
            'user_prompt': 'Extract the company mission, open source status, and employee count.'
        }
    }
)
print(response.json()['data']['json'])
```

---

### 2. Schema Only (Strict Structure)

**Steps:**

1. Set the target webpage URL
2. Define a JSON schema for the desired fields and types
3. Send the request

```bash tab="cURL"
curl -X POST "https://api.anycrawl.dev/v1/scrape" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "json_options": {
      "schema": {
        "type": "object",
        "properties": {
          "company_mission": { "type": "string" },
          "is_open_source": { "type": "boolean" },
          "employee_count": { "type": "number" }
        },
        "required": ["company_mission"]
      }
    }
  }'
```

```javascript tab="JavaScript"
const response = await fetch("https://api.anycrawl.dev/v1/scrape", {
    method: "POST",
    headers: {
        Authorization: "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    body: JSON.stringify({
        url: "https://example.com",
        json_options: {
            schema: {
                type: "object",
                properties: {
                    company_mission: { type: "string" },
                    is_open_source: { type: "boolean" },
                    employee_count: { type: "number" },
                },
                required: ["company_mission"],
            },
        },
    }),
});
const result = await response.json();
console.log(result.data.json);
```

```python tab="Python"
schema = {
    "type": "object",
    "properties": {
        "company_mission": {"type": "string"},
        "is_open_source": {"type": "boolean"},
        "employee_count": {"type": "number"}
    },
    "required": ["company_mission"]
}

import requests

response = requests.post(
    'https://api.anycrawl.dev/v1/scrape',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
    },
    json={
        'url': 'https://example.com',
        'json_options': {
            'schema': schema
        }
    }
)
print(response.json()['data']['json'])
```

---

### 3. Prompt + Schema (Guided Structure & Content)

**Steps:**

1. Set the target webpage URL
2. Define a JSON schema for the output structure
3. Write a prompt to guide the extraction
4. Send the request

```bash tab="cURL"
curl -X POST "https://api.anycrawl.dev/v1/scrape" \
  -H "Authorization: Bearer <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "json_options": {
      "schema": {
        "type": "object",
        "properties": {
          "company_mission": { "type": "string" },
          "is_open_source": { "type": "boolean" },
          "employee_count": { "type": "number" }
        },
        "required": ["company_mission"]
      },
      "user_prompt": "Extract the company mission (string), open source status (boolean), and employee count (number)."
    }
  }'
```

```javascript tab="JavaScript"
const response = await fetch("https://api.anycrawl.dev/v1/scrape", {
    method: "POST",
    headers: {
        Authorization: "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    body: JSON.stringify({
        url: "https://example.com",
        json_options: {
            schema: {
                type: "object",
                properties: {
                    company_mission: { type: "string" },
                    is_open_source: { type: "boolean" },
                    employee_count: { type: "number" },
                },
                required: ["company_mission"],
            },
            user_prompt:
                "Extract the company mission (string), open source status (boolean), and employee count (number).",
        },
    }),
});
const result = await response.json();
console.log(result.data.json);
```

```python tab="Python"
schema = {
    "type": "object",
    "properties": {
        "company_mission": {"type": "string"},
        "is_open_source": {"type": "boolean"},
        "employee_count": {"type": "number"}
    },
    "required": ["company_mission"]
}

import requests

response = requests.post(
    'https://api.anycrawl.dev/v1/scrape',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
    },
    json={
        'url': 'https://example.com',
        'json_options': {
            'schema': schema,
            'user_prompt': 'Extract the company mission (string), open source status (boolean), and employee count (number).'
        }
    }
)
print(response.json()['data']['json'])
```

## JSON options object

The `json_options` parameter is an object that accepts the following parameters:

- `schema`: The schema to use for the extraction.
- `user_prompt`: The user prompt to use for the extraction.
- `schema_name`: Optional name of the output that should be generated.
- `schema_description`: Optional description of the output that should be generated.

### JSON Schema

The schema to use for the extraction. Common fields include:

- `type`: The type of the value, supported types show below.
- `properties`: (for `object`) An object specifying the fields and their schemas.
- `required`: (for `object`) An array of required field names.

#### Supported Types

| Type      | Description                | Example                                 |
| --------- | -------------------------- | --------------------------------------- |
| `string`  | Text data                  | Company name, descriptions, titles      |
| `number`  | Numeric values             | Prices, quantities, ratings             |
| `boolean` | True/false values          | Availability flags, feature indicators  |
| `object`  | Nested structures          | Address details, product specifications |
| `array`   | Lists of values or objects | Tags, categories, feature lists         |

### Schema Example

```json
{
    "type": "object",
    "properties": {
        "company_name": {
            "type": "string"
        },
        "is_open_source": {
            "type": "object",
            "properties": {
                "answer": {
                    "type": "string"
                },
                "value": {
                    "type": "boolean"
                }
            }
        },
        "employee_count": {
            "type": "number"
        }
    },
    "required": ["company_name"]
}
```
